note

  • This isn't meant to be a generalized NN implementation. Just plow through this fixed 400 > 25 > 10 setup to get a feeling for how the network works

In [1]:
%reload_ext autoreload
%autoreload 2

import sys
sys.path.append('..')

from helper import nn
from helper import logistic_regression as lr
import numpy as np

prepare data


In [2]:
X_raw, y_raw = nn.load_data('ex4data1.mat', transpose=False)
X = np.insert(X_raw, 0, np.ones(X_raw.shape[0]), axis=1)
X.shape


Out[2]:
(5000, 401)


In [3]:
y_raw


Out[3]:
array([10, 10, 10, ...,  9,  9,  9], dtype=uint8)

In [4]:
y = nn.expand_y(y_raw)
y


Out[4]:
array([[ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.]])
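
So expand_y one-hot encodes the labels: each label becomes a 10-dim row with a single 1, and label 10 (which stands for the digit 0 in this dataset) maps to the last column, as Out[4] shows. A minimal sketch of such a helper (the real one lives in helper/nn.py and may be written differently):

def expand_y(y):
    # one 10-dim one-hot row per example; label k lights up column k-1
    result = []
    for label in y:
        row = np.zeros(10)
        row[label - 1] = 1   # e.g. label 10 -> index 9, as in Out[4] above
        result.append(row)
    return np.array(result)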

load weights


In [5]:
t1, t2 = nn.load_weight('ex4weights.mat')
t1.shape, t2.shape


Out[5]:
((25, 401), (10, 26))

In [6]:
theta = nn.serialize(t1, t2)  # flatten params
theta.shape


Out[6]:
(10285,)
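
serialize just unrolls both weight matrices into one flat vector (25*401 + 10*26 = 10285), the shape scipy's optimizers expect. A sketch, together with an assumed inverse deserialize (a hypothetical name here, hard-coded to this exercise's shapes) that the later steps need:

def serialize(a, b):
    # unroll both matrices into a single 1d vector
    return np.concatenate((np.ravel(a), np.ravel(b)))

def deserialize(seq):
    # recover t1 (25, 401) and t2 (10, 26) from the flat vector
    return seq[:25 * 401].reshape(25, 401), seq[25 * 401:].reshape(10, 26)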

feed forward

(400 + 1) -> (25 + 1) -> (10)


In [7]:
_, _, _, _, h = nn.feed_forward(theta, X)
h # 5000*10


Out[7]:
array([[  1.12661530e-04,   1.74127856e-03,   2.52696959e-03, ...,
          4.01468105e-04,   6.48072305e-03,   9.95734012e-01],
       [  4.79026796e-04,   2.41495958e-03,   3.44755685e-03, ...,
          2.39107046e-03,   1.97025086e-03,   9.95696931e-01],
       [  8.85702310e-05,   3.24266731e-03,   2.55419797e-02, ...,
          6.22892325e-02,   5.49803551e-03,   9.28008397e-01],
       ..., 
       [  5.17641791e-02,   3.81715020e-03,   2.96297510e-02, ...,
          2.15667361e-03,   6.49826950e-01,   2.42384687e-05],
       [  8.30631310e-04,   6.22003774e-04,   3.14518512e-04, ...,
          1.19366192e-02,   9.71410499e-01,   2.06173648e-04],
       [  4.81465717e-05,   4.58821829e-04,   2.15146201e-05, ...,
          5.73434571e-03,   6.96288990e-01,   8.18576980e-02]])
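
A sketch of that (400 + 1) -> (25 + 1) -> (10) pass (hedged: the real nn.feed_forward may differ in detail, but it clearly returns the intermediate values as well, since backpropagation will need them later):

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feed_forward(theta, X):
    t1 = theta[:25 * 401].reshape(25, 401)   # undo serialize
    t2 = theta[25 * 401:].reshape(10, 26)
    a1 = X                                   # 5000 x 401, bias column already inserted
    z2 = a1.dot(t1.T)                        # 5000 x 25
    a2 = np.insert(sigmoid(z2), 0, np.ones(z2.shape[0]), axis=1)  # add bias -> 5000 x 26
    z3 = a2.dot(t2.T)                        # 5000 x 10
    h = sigmoid(z3)                          # 5000 x 10, one probability per class
    return a1, z2, a2, z3, h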

cost function

think about it: we now have $y$ and $h_{\theta}$, both in $R^{5000 \times 10}$
if you ignore the $m$ and $K$ dimensions for a moment, the computation is element-wise and trivial
the equation: $\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log(h_{\theta}(x^{(i)})_k) - (1-y_k^{(i)}) \log(1-h_{\theta}(x^{(i)})_k) \right]$
in other words: compute $-y \log(h_{\theta}) - (1-y) \log(1-h_{\theta})$ element-wise, sum the resulting 2d array up, and divide by $m$
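
As a sketch, that recipe translates almost line for line into numpy (this reuses nn.feed_forward from above; the actual nn.cost in helper/nn.py presumably does the equivalent):

def cost(theta, X, y):
    m = X.shape[0]
    _, _, _, _, h = nn.feed_forward(theta, X)   # h is 5000 x 10
    # element-wise over the whole 5000 x 10 array, then one big sum
    pair_computation = -y * np.log(h) - (1 - y) * np.log(1 - h)
    return pair_computation.sum() / m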


In [8]:
nn.cost(theta, X, y)


Out[8]:
0.28762916516131892

regularized cost function

the first columns of t1 and t2 hold the intercept (bias) terms $\theta_0$; just leave them out when you compute the regularization term
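
A sketch under that rule (assuming the exercise default $\lambda = 1$; the slicing t1[:, 1:] is what drops the bias column):

def regularized_cost(theta, X, y, l=1):
    m = X.shape[0]
    t1 = theta[:25 * 401].reshape(25, 401)   # undo serialize
    t2 = theta[25 * 401:].reshape(10, 26)
    # penalize every weight except the bias column of each layer
    reg_t1 = (l / (2 * m)) * np.power(t1[:, 1:], 2).sum()
    reg_t2 = (l / (2 * m)) * np.power(t2[:, 1:], 2).sum()
    return nn.cost(theta, X, y) + reg_t1 + reg_t2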


In [9]:
nn.regularized_cost(theta, X, y)


Out[9]:
0.38376985909092365
